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ABSTRACT 

A new algorithm for solving subset regression 
problems is described. The algorithm performs a QR 
decomposition with a new column-pivoting strategy, 
which permits subset selection directly from the 
originally defined regression parameters. This, in 
combination with a number of extensions of the new 
technique, makes the method a very flexible tool 
for analyzing subset regression problems in which 
the parameters have a physical meaning. 


INTRODUCTION 

The subset regression problem analyzed in this 
paper is formulated for the following least-squares 
problem: 

min| | Ax - b| | 2 (1) 

x 

The system matrix A e R mXn (m > n) is assumed to 
contain a number of columns that are "nearly" lin- 
early dependent on an independent set of columns 
of A. The task now is to find the minimal number 
of columns comprising that independent set. Fur- 
thermore, the adjective "nearly" is used to indi- 
cate that small perturbations of A, that is, of 
the order of the inaccuracies (0(c)) on the entries 
of A, establish that linear dependency. 

A numerically reliable technique used so far 
to detect near dependencies is that of singular- 
value decomposition (SVD) [1], This technique 
determines a minimal set of columns. Say that the 
dimension of this set is k. But, each of these 
selected columns now is a linear combination of the 
original columns of A. Hence, that minimal set 
will generally contain more than k individual 
columns of A. This is a major drawback for prac- 
tical subset regression problems, such as aerody- 
namic model identification [2], [3]. 


This paper describes a new technique that 
allows the direct selection of k individual col- 
umns of A. The technique performs a QR decomposi- 
tion with a new column -pi voting strategy. Further- 
more, it is not necessary to compute the full SVD 
of A in order to determine k. This is because 
accurate estimates of the singular values, which 
allow such a determination, are also obtained. 

This new technique is described first, and 
then a number of extensions and modifications of it 
are presented that allow its application to practi- 
cal problems in subset regression analysis. These 
problems are in the order to be discussed: 

1. The determination of the interrelationship 
between the defined independent and dependent col- 
umns of A by the new technique. 

2. The inclusion of a priori information 
about the structure of the regression model, as 
well as about estimates of the individual compo- 
nents of x in ( 1 ) and their corresponding 
uncertainties. 

3. The capability to jointly process columns 
of A that are contaminated by noise and others 
that are noise-free without using scaling 
techniques. 

4. The efficient solution of the closely 
related total least-squares problem, described in 
[4). 


NEW COLUMN-PIVOTING STRATEGY IN QR DECOMPOSITION 

The QR decomposition of the matrix A in ( 1 ) 
is denoted as 



where R e R nxn is upper triangular and 
Q o r Q o = I . This transformation differs from the 
SVD in tnat no right orthogonal transformation on 
A is applied. It is precisely this right- 
transformation that obscures the selection of k 
individual columns of A. To avoid this problem, 
the only right-transformation on A allowed is a 
column permutation. 
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The idea of column pivoting is to exploit the 
freedom Introduced by this column permutation to 
solve the rank-deficient least-squares problem 
(1). Based on the new pivoting strategy, it will 
be shown that this can be done as reliably as with 
the SVD method without relying on complete SVD. 

Let us clarify this for the rank-one deficient 
case, that is, for k = n - 1, implying that 
o n = 0(c) and o n-1 >> o n . (For the sake of 
brevity, we restrict without loss of generality to 
this case throughout the whole paper.) Here the 
singular values of A are ordered in 

decreasing magnitude. 


From [5] (for example) it is known that the 
magnitude of the last diagonal element of R in 
(2) is an upper bound for o n . Therefore, the 
rank-one deficiency can be revealed with column 
pivoting if we can find a column permutation 
matrix *_ such that 


Q> n =|— KM (3) 


■ 

- 

e»n 

n 

R 11 

r 12 

0 

-22 _ 


where rljj is "small 1 * of 0(c). The existence 
of such a permutation is indicated by exercise 
P-6-4-4 of [6). 


Furthermore this exercise gives a constructive 
way to find this permutation. The key information 
is the computation of the right singular vector 
corresponding to the smallest singular value <? n . 
This can be done, for example, by the inverse 
iteration method [6]. 


Sb ' R n-i ■ Rk 



( 6 ) 


(7) 


( 8 ) 


where (A* n ) k denotes the k selected columns 
of A. 


DETERMINING INTERRELATIONSHIPS IN THE 
REGRESSION MODEL 

The above algorithm rearranges the columns 
of A as [a Cl ... a c k a c k +j ••• a c n )* Here the 
vectors (a c ^ ... a Ck ) correspond to the so-called 
identifiable or independent components of x, 
denoted previously as x b , and lac k+1 ••• a c n i 
corresponds to the dependent ones. 

For a number of applications it is necessary 
to specify these dependencies more precisely. In 
this case, it is necessary to know on which minimal 
set of vectors from [a Cl ... a Ck l each vector from 
[a Ck ^ ••• a c n l depends. 

The decomposition obtained in (4) reveals the 
information necessary to answer this question. 

First, focus on the following least-squares 
problem: 


The decomposition (3) combined with (2) 
produces: 


and 
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(5) 



* -1 

Elements of the solution fi = R ]j r i2 are the coor ” 

dinates of the projection of a Cn in the space 
spanned by [a Cl ... a^^]. course, "large" 

components in B suggest a strong correlation 
between the corresponding columns of 
(a ci ... a c n _il and a c n - However, it is more 
convenient to normalize these coordinates. Let us 
therefore define the following cosines: 


T 

Therefore, if we explicitly accumulate Q n (which 
can be done efficiently because we are working only 
in the n-dimensional parameter jpace as opposed to 
the sample space) and perform Q q with 
Householder transformations such as described in 

[7], the basic solution x fe , the residual v, and 

the estimate b of (1) can be computed in a manner 
that is completely similar to that in [7). These 
quantities can now be given as follows: 


B( i) | |r. | U 

cos 0. = *’ti n for i = 1 :k (10) 

1 I I r l2 1 '2 

with rj denoting the ith column of R-j-j in 
(9). This set of numbers ranges from -1 to >1 and, 
therefore, can be interpreted completely anal- 
ogously to the commonly used cross-correlation 
coefficients [5]. Normally, the latter are 



retrieved from the covariance matrix of the least- 
squares estimates of x in (1). Since A here is 
assumed to be rank-deficient, this covariance 
matrix cannot be computed unless use is made of 
pseudoinverses. 

Based on the following theorem we can however 
gain an understanding of this covariance matrix 
because of the inclusion of a dependent column. 


Theorem 1 : 


If an upper triangular matrix R e R nxn is 
given as 


R 1l | r 12 
0 | r 22 


with R^ of rank n-1 and r 22 arbitrary, then 


1 

T -1 * *T 

{R* 1 R 1 , ) ♦ a 0 6 

-a 0 

= 




-a 0 

a 


(R T R)‘ 


A - 1 2 
with B = R n r 12 and a = 1/r 22* 


(11) 


Proof : The proof of Theorem 1 could be obtained 
directly from the definition of the inverse of a 
matrix. 


For the decomposition obtained in (6), 

Theorem 1 allows us to exactly calculate the influ- 
ence of any r 22 on the estimated variances ^of 3^ 
(given by the diagonal elements of (R^R^)" ). 

When we make the practical observation that "a 
component of x is called not identifiable if its 
corresponding estimated variance becomes unaccepta- 
bly large , 11 then another procedure for determining 
the dependencies more precisely corresponds to 
determining whether an unacceptable increase in the 
variances of the components of occurs. 

Both of the procedures described above supply 
parameters and insights that have commonly been 
used to analyze ident if lability problems in practi- 
cal regression problems. 


INCLUSION OF A PRIORI INFORMATION 

For the physical applications addressed in 
this paper, information is often available about 
(1) the structure of the regression model, that is, 
which of the components of x in ( 1 ) have to be 
in x b , and (2) a priori estimates of individual 
components of x with corresponding variance. For 
the sake of brevity, only the capabilities of the 
new procedure in dealing with the first a priori 
information source are presented here. 


We can include that information after the 
decomposition given in (4) has been obtained. Here 
we would then use the cosines defined in (10) or 
the revealed influences of the dependent terms on 
the estimated variances of the independent terms 
of x. This information would allow us to inter- 
change columns of [a C j ... a c ^J with columns of 
[a c ... a c ]. Algorithmically, the necessary 

permutation can again be done efficiently, since we 
can remain working on the compressed matrix struc- 
ture defined in (4). Furthermore, information 
could be revealed about the effect of this inter- 
change by computing the change in residual v, 
efficiently computed as given in (7). 


JOINTLY PROCESSING NOISE-FREE AND NOISY 
COLUMNS IN A 

In many regression models there is a so-called 
offset term. This results in a column of A of 
all ones. The other columns might result from 
observations and are, therefore, contaminated by 
errors, often of different magnitudes for each 
column. A rough partitioning of the A-matrix in 
( 1 ) is therefore 



Noise-free Contaminated by errors 


The described algorithm requires, as the SVD, 
a threshold a to determine therank of A in ( 1 ) 
[5]. The use of such a single o requires column 
scaling [5], so that the magnitude of the errors on 
the columns of this scaled matrix is of the same 
order. The mixing as given in (12.) results in a 
very bad conditioning of this scaled matrix and, 
therefore, should be avoided. The new algorithm 
might handle the situation in (12) without scaling 
when modified in the following stages. 


Stage 1 : 


Process Aj by the algorithm given in the 
second section to produce 


R n 

R 12 

p 

= 0 

R 22 

R 13 


0 

R 23 


(13) 


This investigates the dependencies among the noise- 
free columns and, therefore, a threshold 
Oj = 0(machine precision) should be used. Before 
continuing to the second stage, check whether 
columns of A 2 are dependent on the columns 
represented in R^ in (13). This corresponds to 
checking whether 


||r 23 (i)|| 2 < o. /m (14) 


as outlined in [81 . Here £23(1) denoted the 
ith column of R 2 j in (13) and a L the corre- 
sponding standard deviation of the errors on that 
column. This procedure reduces ^13*^23 i* 1 03) 

to R 1 3»R 2 3> respectively. 

Stage 2 ; 

Apply the algorithm given in the second sec- 
tion to R 2 ^: 


flu 


0 

— » 

^ 22 _ 


using a threshold a. s 0 (magnitude of errors on 

a 2 ). 

Remark : This two-stage procedure might be extended 

to the following three-stage procedure, where A 
is partioned as 



Noise-free Mild errors Large errors 

( 16 ) 

EFFICIENT SOLUTION OF TOTAL LEAST-SQUARES PROBLEM 

This problem is not addressed here; the reader 
is referred to [8) for that discussion. 


CONCLUDING REMARKS 

A new scheme was presented for efficiently 
solving subset regression problems. The scheme is 
as reliable as the singular value decomposition 
technique, but does the subset selection directly 
from the defined set of regression parameters. 
Furthermore, a number of interesting extensions and 
modifications of this new procedure have been pre- 


sented that allow the flexible solution of subset 
regression problems where the regression parameters 
are of physical interest. 
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